Formalizing knowledge used in spectrogram reading: acoustic and perceptual evidence from stops

Author

  • Lori Lamel
Abstract

Since the invention of the sound spectrograph in 1946 by Koenig, Dunn and Lacey, spectrograms have been widely used for speech research. Over the last decade there has been revived interest in applying spectrogram reading to continuous speech recognition. Spectrogram reading involves interpreting the acoustic patterns in the image to determine the spoken utterance. One must selectively attend to many different acoustic cues, interpret their significance in light of other evidence, and make inferences based on information from multiple sources. While early attempts at spectrogram reading met with limited success (Klatt and Stevens, 1973; Lindblom and Svenssen, 1973; Svenssen, 1974), Zue, in a series of experiments intended to illustrate the richness of phonetic information in the speech signal (Cole et al., 1980; Cole and Zue, 1980), demonstrated that high-performance phonetic labeling of a spectrogram could be obtained. In this thesis, a formal evaluation of spectrogram reading was conducted in order to better understand the process and to evaluate the ability of spectrogram readers. The research consisted of three main parts: an evaluation of spectrogram readers on a constrained task, a comparison to listeners on the same task, and a formalization of spectrogram-reading knowledge in a rule-based system. The performance of five spectrogram readers was assessed using speech from 299 talkers. The readers identified stop consonants that were extracted from continuous speech and presented in their immediate phonemic context. The task was designed so that lexical and other higher-level sources of knowledge could not be used. The readers' average identification rate varied across contexts, from 73-82% for the top choice and from 77-93% for the top two choices. The performance of spectrogram readers was, on average, 10% below that of human listeners on the same task. Listeners had an overall identification rate that ranged from 85 to 97%.
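The top-choice and top-two identification rates quoted above can be illustrated with a minimal sketch. It assumes each reader response is a ranked list of candidate stops for one token; the function name `top_n_rate` and the sample data are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def top_n_rate(responses: Sequence[Sequence[str]],
               truth: Sequence[str],
               n: int) -> float:
    """Fraction of tokens whose true stop is among the first n ranked choices."""
    if len(responses) != len(truth):
        raise ValueError("responses and truth must have the same length")
    if not truth:
        raise ValueError("at least one token is required")
    hits = sum(1 for ranked, label in zip(responses, truth)
               if label in ranked[:n])
    return hits / len(truth)


# Hypothetical data: each inner list is one reader's ranked guesses for a token.
responses = [["p", "b"], ["t", "d"], ["g", "k"], ["d", "t"]]
truth = ["p", "d", "k", "d"]

top1 = top_n_rate(responses, truth, 1)  # only the first choice counts
top2 = top_n_rate(responses, truth, 2)  # first or second choice counts
print(f"top choice: {top1:.0%}, top two choices: {top2:.0%}")
```

By construction the top-two rate can never be lower than the top-choice rate, which matches the ordering of the two ranges reported for the readers.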
The performance of the readers is comparable to that in other spectrogram reading experiments reported in the literature; however, the other studies have typically evaluated a single subject on speech spoken by a small number of talkers. Although researchers have suggested that the process can be described in terms of rules (Zue, 1981), few compilations of rules or strategies exist (Rothenberg, 1963; Fant, 1968; Svenssen, 1974). In order to formalize the information used in spectrogram reading, a system for identifying stop consonants was developed. A knowledge-based system was chosen because the expression and use of the knowledge are explicit. The emphasis was on capturing the acoustic descriptions and modeling the reasoning thought to be used

Related articles

Using decision trees to construct optimal acoustic cues

This paper presents an approach to the optimization of acoustic cues used for stop identification in the context of an acoustic-phonetic decoding system which uses automatic acoustic event extractors (a formant tracking algorithm and a burst analyzer). The acoustic cues have been designed on the basis of acoustic studies on stops and spectrogram reading experiments. This ensures that these cues ...

Capturing Breathy Voice: Durational Measures of Oral Stops in Marathi

The present study investigates a series of techniques used to capture the durational differences of oral stops in Marathi, an Indic language that exhibits a four-way phonemic distinction among oral stops. Like many of its Indic relatives, Marathi utilizes both voicing and aspiration to achieve oral stop contrasts, yielding aspirated and plain versions of both voiced and voiceless stops. While v...

A speech recognition strategy based on making acoustic evidence and phonetic knowledge explicit

We describe a prototype implementation of a representational approach to acoustic-phonetics in knowledge-based speech recognition. Our scheme is based on the 'Speech Sketch', a structure which enables acoustic evidence and phonetic knowledge to be represented in similar ways, so that like can be compared with like. The process of building the Speech Sketch begins with spectrogram image processi...

Exploring the Meaning of Quality from Urban Space Users’ Viewpoint by Analyzing Conceptual Environment Codes

The main purpose of urban design is to create good and high-quality urban spaces and environments for people to live while such quality may not be determined only by imposing a structural, perceptual and value system of the designer. It can be said that human and his powers to perceive surrounding environments are the focus of urban design. Having reviewed previous researches and theories in re...

Acoustic Analysis of Persian EFL Learners' Pronunciation of English Vowels

This paper reports the results of an experimental study on non-native production of English vowels. Two groups of Persian EFL learners varying in language proficiency were tested on their ability to produce the nine plain vowels of American English. Vowel production accuracy was assessed by means of acoustic measurements. Ladefoged and Maddieson’s (1996) F1 F2 measurements for American English v...

Publication date: 1988